Given a large graph with few node labels, how can we (a) identify the mixed network-effect of the graph and (b) predict the unknown labels accurately and efficiently? This work proposes Network Effect Analysis (NEA) and UltraProp, which are based on two insights: (a) the network-effect (NE) insight: a graph can exhibit not only one of homophily and heterophily, but also both, or none, in a label-wise manner; and (b) the neighbor-differentiation (ND) insight: neighbors have different degrees of influence on the target node, based on the strength of their connections. NEA provides a statistical test to check whether a graph exhibits a network effect, and surprisingly discovers the absence of NE in many real-world graphs known to have heterophily. UltraProp solves the node classification problem with notable advantages: (a) Accurate, thanks to the network-effect (NE) and neighbor-differentiation (ND) insights; (b) Explainable, precisely estimating the compatibility matrix; (c) Scalable, being linear in the input size and handling graphs with millions of nodes; and (d) Principled, with a closed-form formula and theoretical guarantees. Applied to eight real-world graph datasets, UltraProp outperforms top competitors in terms of accuracy and run time, requiring only stock CPU servers. On a large real-world graph with 1.6M nodes and 22.3M edges, UltraProp achieves a more than 9x speedup (12 minutes vs. 2 hours) over most competitors.
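To make the compatibility-matrix idea concrete, here is a minimal propagation sketch. It is not UltraProp itself: the damping update, the edge-count estimator for the compatibility matrix, and all function names are illustrative assumptions.

```python
# A minimal sketch of compatibility-matrix label propagation, in the spirit of
# UltraProp but NOT its actual algorithm: update rule, damping factor, and the
# compatibility estimator below are illustrative assumptions.
import numpy as np
import scipy.sparse as sp

def estimate_compatibility(adj, labels, n_classes):
    """Estimate a c x c compatibility matrix from edges whose endpoints are both labeled."""
    H = np.full((n_classes, n_classes), 1e-9)
    rows, cols = adj.nonzero()
    for u, v in zip(rows, cols):
        if labels[u] >= 0 and labels[v] >= 0:
            H[labels[u], labels[v]] += 1.0
    return H / H.sum(axis=1, keepdims=True)       # row-normalize into probabilities

def propagate(adj, labels, n_classes, damping=0.5, iters=20):
    """labels: int array with -1 marking unlabeled nodes; adj: sparse CSR adjacency."""
    n = adj.shape[0]
    deg = np.asarray(adj.sum(axis=1)).ravel().clip(min=1)
    A_norm = sp.diags(1.0 / deg) @ adj            # row-stochastic adjacency
    H = estimate_compatibility(adj, labels, n_classes)
    prior = np.full((n, n_classes), 1.0 / n_classes)
    prior[labels >= 0] = np.eye(n_classes)[labels[labels >= 0]]
    belief = prior.copy()
    for _ in range(iters):                        # each pass is linear in the edge count
        belief = (1 - damping) * prior + damping * (A_norm @ belief @ H)
    return belief.argmax(axis=1)
```

Because each iteration is one sparse matrix-vector-style product, the per-pass cost stays linear in the number of edges, which is the scalability property the abstract emphasizes.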
The US federal government spends more than a trillion dollars per year on health care, largely provided by private third parties and reimbursed by the government. A major concern in this system is overbilling, waste, and fraud by providers, who face incentives to misreport their claims in order to receive higher payments. In this paper, we develop novel machine learning tools to identify providers that overbill Medicare, the US federal health insurance program for elderly adults and the disabled. Using large-scale Medicare claims data, we identify patterns consistent with fraud or overbilling among inpatient hospitalizations. Our proposed approach for Medicare fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing reasoning and interpretable insights into the potentially suspicious behavior of the flagged providers. Data from the Department of Justice on providers facing anti-fraud lawsuits and several case studies validate our approach and findings both quantitatively and qualitatively.
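As one hedged illustration of label-free, explainable flagging (not the paper's actual detector), robust z-scores over per-provider billing features can both rank providers and name the feature that triggered each flag; every feature name and threshold below is a hypothetical stand-in.

```python
# A hand-rolled sketch of unsupervised, explainable provider flagging. The
# paper's method is not reproduced here; robust z-scores over hypothetical
# billing features illustrate label-free detection with per-feature reasoning.
import pandas as pd

def flag_providers(df, feature_cols, threshold=4.0):
    """df: one row per provider; feature_cols: e.g. average length of stay,
    share of highest-severity billing codes (hypothetical features)."""
    scores = pd.DataFrame(index=df.index)
    for col in feature_cols:
        med = df[col].median()
        mad = (df[col] - med).abs().median()
        mad = mad if mad > 0 else 1e-9                 # guard against zero spread
        scores[col] = 0.6745 * (df[col] - med) / mad   # robust z-score (median/MAD)
    out = pd.DataFrame({
        "max_score": scores.max(axis=1),
        "reason": scores.idxmax(axis=1),               # the feature driving the flag
    })
    return out[out["max_score"] > threshold].sort_values("max_score", ascending=False)
```

Because no labels are used and each flag carries the feature that drove it, the sketch mirrors the two properties highlighted above: fully unsupervised and explainable to end users.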
Given cardiac patients monitored in the ICU (intensive care unit) for brain activity, how can we predict their health outcomes as early as possible? Early decision-making is critical in many applications; for example, monitoring patients may assist in early intervention and improved care. On the other hand, early prediction on EEG data poses several challenges: (i) the earliness-accuracy trade-off: observing more data often improves accuracy but sacrifices earliness; (ii) large-scale (for training) and streaming (for online decision-making) data processing; and (iii) multivariate (due to multiple electrodes) and variable-length (due to patients' varying lengths of stay) time series. Motivated by this real-world application, we present BeneFitter, which infuses the savings earned from an early prediction, as well as the cost from misclassification, into a unified domain-specific target called benefit. Unifying these two quantities allows us to directly estimate a single target (i.e., benefit) and, importantly, dictates exactly when to output a prediction: when the benefit estimate becomes positive. BeneFitter (a) is efficient and fast, with training time linear in the number of input sequences, and can run in real time for decision-making; (b) can handle multivariate and variable-length time series, making it suitable for patient data; and (c) is effective, providing up to 2x time savings with the same or better accuracy compared to competitors.
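As a toy illustration of the benefit idea (not the paper's estimator), the following stopping rule emits a prediction the moment estimated savings outweigh estimated misclassification risk; 'savings_per_step' and 'misclassification_cost' are hypothetical constants, and the confidence sequence could come from any streaming classifier.

```python
# An illustrative stopping rule in the spirit of the benefit target above,
# NOT the paper's estimator: the cost constants are hypothetical.
def decide_early(confidences, horizon, savings_per_step=1.0, misclassification_cost=50.0):
    """confidences[t]: the classifier's probability for its current best class
    after observing t+1 steps. Emit a prediction once estimated benefit > 0."""
    for t, p in enumerate(confidences):
        savings = savings_per_step * (horizon - t)          # value of stopping now
        expected_cost = (1.0 - p) * misclassification_cost  # risk of being wrong
        if savings - expected_cost > 0:                     # benefit turns positive
            return t, p                                     # decide early at step t
    return horizon - 1, confidences[-1]                     # forced decision at the end
```

The single scalar "benefit" replaces separate earliness and accuracy objectives, which is exactly what lets the rule decide, per patient, when to stop watching.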
Given a cloud of m-dimensional data points, how would we spot, as well as rank, both single-point and group anomalies? We are the first to generalize anomaly detection along two dimensions. The first dimension is that we handle both point anomalies and group anomalies under a unified view; we refer to them as generalized anomalies. The second dimension is that we not only detect, but also rank, anomalies in order of suspiciousness. Detection and ranking of anomalies have numerous applications: for example, in the EEG recordings of an epileptic patient, an anomaly may indicate a seizure; in computer-network traffic data, it may signify a power failure or a DoS/DDoS attack. We start by setting out some reasonable axioms; surprisingly, none of the earlier methods passes all of them. Our main contribution is the gen2Out algorithm, which has the following desirable properties: (a) Principled and sound anomaly scoring that obeys the axioms for detectors; (b) Doubly general, in that it detects, as well as ranks, generalized anomalies, both point and group; (c) Scalable: it is fast and scalable, linear in the input size; and (d) Effective: experiments on real-world epilepsy recordings (200GB) demonstrate the effectiveness of gen2Out, as confirmed by clinicians. Experiments on 27 real-world benchmark datasets show that gen2Out detects ground-truth groups, matches or outperforms point-anomaly baseline algorithms on the task of point-anomaly detection, has no competition for group anomalies, and requires about 2 minutes on a stock machine.
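To see how point and group anomalies can share a single suspiciousness scale, consider this sketch; it is not the gen2Out algorithm, just an isolation-forest stand-in, and the group index sets are hypothetical inputs.

```python
# A sketch of unified point- and group-anomaly ranking; NOT the gen2Out
# algorithm, just an isolation-forest illustration of scoring both kinds of
# anomaly on one scale. Group definitions are hypothetical index sets.
import numpy as np
from sklearn.ensemble import IsolationForest

def score_points(X, random_state=0):
    forest = IsolationForest(random_state=random_state).fit(X)
    return -forest.score_samples(X)           # flip sign: higher = more anomalous

def rank_generalized_anomalies(X, groups):
    """groups: dict mapping a group id to an array of row indices in X."""
    point_scores = score_points(X)
    ranked = [("point", int(i), float(s)) for i, s in enumerate(point_scores)]
    for gid, idx in groups.items():           # a group scores as its members' mean
        ranked.append(("group", gid, float(point_scores[idx].mean())))
    return sorted(ranked, key=lambda r: r[2], reverse=True)
```

Putting both kinds of anomaly on one ranked list is the "generalized anomaly" viewpoint the abstract describes; the actual axioms and scoring of gen2Out are in the paper.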
With the advent of Neural Style Transfer (NST), stylizing an image has become quite popular. A convenient way to extend stylization techniques to videos is to apply them on a per-frame basis. However, such per-frame application usually lacks temporal consistency, which manifests as undesirable flickering artifacts. Most of the existing approaches for enforcing temporal consistency suffer from one or more of the following drawbacks. They (1) are only suitable for a limited range of stylization techniques, (2) can only be applied in an offline fashion, requiring the complete video as input, (3) cannot provide consistency for the task of stylization, or (4) do not provide interactive consistency-control. Note that existing consistent video-filtering approaches aim to completely remove flickering artifacts and thus do not respect any specific consistency-control aspect. For stylization tasks, however, consistency-control is an essential requirement, where a certain amount of flickering can add to the artistic look and feel. Moreover, making this control interactive is paramount from a usability perspective. To achieve the above requirements, we propose an approach that can stylize video streams while providing interactive consistency-control. Apart from stylization, our approach also supports various other image-processing filters. To achieve interactive performance, we develop a lite optical-flow network that operates at 80 frames per second (FPS) on desktop systems with sufficient accuracy. We show that the final consistent video output using our flow network is comparable to that obtained using a state-of-the-art optical-flow network. Further, we employ an adaptive combination of local and global consistent features and enable interactive selection between the two. Through objective and subjective evaluation, we show that our method is superior to state-of-the-art approaches.
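A minimal version of flow-based consistency blending, assuming any stylize-per-frame pipeline: OpenCV's Farneback flow stands in for the paper's lite optical-flow network, and the 'consistency' argument plays the role of the interactive control knob.

```python
# A hedged sketch of per-frame consistency blending; the paper's lite flow
# network is replaced by OpenCV's Farneback flow for self-containedness.
import cv2
import numpy as np

def consistent_stylize(prev_gray, cur_gray, prev_out, cur_styled, consistency=0.7):
    """prev_out: previous consistent output frame; cur_styled: current frame
    stylized independently. Returns the blended, temporally smoothed frame."""
    # Backward flow (current -> previous) tells each pixel where it came from.
    flow = cv2.calcOpticalFlowFarneback(cur_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = cur_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_out, map_x, map_y, cv2.INTER_LINEAR)  # motion-compensated previous output
    # consistency=1 suppresses all flicker; consistency=0 keeps the raw per-frame style.
    return cv2.addWeighted(warped, consistency, cur_styled, 1.0 - consistency, 0)
```

Exposing 'consistency' as a live slider is one plausible way to realize the interactive control the abstract argues for, where some residual flicker can be kept deliberately for artistic effect.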
In nonparametric independence testing, we observe i.i.d. data $\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in any general spaces, and we wish to test the null that $X$ is independent of $Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Thus, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. This paper provides a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced "cross" HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. This requires building on the newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in particular developing several nontrivial extensions of the theory in Shekhar et al. (2022), which developed an analogous permutation-free kernel two-sample test. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the full dCov or HSIC, our variants have the same power up to a $\sqrt 2$ factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.
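Schematically, the cross U-statistic construction behind these tests splits the sample into two halves $S_1$ and $S_2$ of sizes $n_1$ and $n_2$ and studentizes half-sample averages; with $g$ denoting the (appropriately centered) HSIC or dCov kernel, whose exact centering for xHSIC and xdCov is developed in the paper, the shape of the construction is
$$
\hat{h}_i \;=\; \frac{1}{n_2}\sum_{j \in S_2} g\big((X_i, Y_i), (X_j, Y_j)\big), \quad i \in S_1,
\qquad
T \;=\; \frac{\sqrt{n_1}\,\bar{h}}{\hat{\sigma}_h} \;\overset{d}{\longrightarrow}\; \mathcal{N}(0,1) \text{ under } H_0,
$$
where $\bar{h}$ and $\hat{\sigma}_h$ are the sample mean and standard deviation of $\{\hat{h}_i\}_{i \in S_1}$. The test then simply rejects when $T$ exceeds a standard Gaussian quantile, with no permutations required; the $\sqrt{2}$ power factor mentioned above is the price of the sample split.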
Metric learning aims to learn distances from the data, which enhances the performance of similarity-based algorithms. Author style detection is a metric learning problem, where learning style features with small intra-class variations and large inter-class differences is of great importance for achieving better performance. Recently, metric learning based on softmax loss has been used successfully for style detection. While softmax loss can produce separable representations, its discriminative power is relatively poor. In this work, we propose NBC-Softmax, a contrastive-loss-based clustering technique for softmax loss, which is more intuitive and able to achieve superior performance. Our technique meets the criterion of using a larger number of samples, thus achieving block contrastiveness, which is proven to outperform pair-wise losses. It uses mini-batch sampling effectively and is scalable. Experiments on 4 darkweb social forums with NBCSAuthor, which uses the proposed NBC-Softmax for author and sybil detection, show that our negative block contrastive approach consistently outperforms state-of-the-art methods using the same network architecture. Our code is publicly available at: https://github.com/gayanku/NBC-Softmax
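As a rough PyTorch sketch of combining softmax loss with a block-style contrastive term (the exact NBC-Softmax formulation is in the paper and repository above): mini-batch class centroids act as blocks, and the temperature 0.1 and weight 'alpha' are illustrative assumptions.

```python
# A hedged sketch of adding a block-contrastive term to softmax training;
# NOT the exact NBC-Softmax loss, just the general shape of the idea.
import torch
import torch.nn.functional as F

def block_contrastive_softmax_loss(embeddings, logits, labels, alpha=0.5):
    ce = F.cross_entropy(logits, labels)                 # standard softmax loss
    # Class centroids within the mini-batch act as "blocks" of samples.
    classes = labels.unique()
    centroids = torch.stack([embeddings[labels == c].mean(dim=0) for c in classes])
    sims = F.normalize(embeddings, dim=1) @ F.normalize(centroids, dim=1).T
    # Each sample's target is the column of its own class centroid.
    targets = torch.stack([(classes == y).nonzero().squeeze() for y in labels])
    contrast = F.cross_entropy(sims / 0.1, targets)      # pull to own block, push from others
    return ce + alpha * contrast
```

Contrasting against whole-class centroids rather than individual pairs is what "block contrastiveness" buys: each sample sees many negatives at once instead of a single pair.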
Biomedical image segmentation is one of the fastest-growing fields and has seen extensive automation through the use of Artificial Intelligence. This has enabled widespread adoption of accurate techniques to expedite screening and diagnostic processes that would otherwise take several days to finalize. In this paper, we present an end-to-end pipeline to segment lungs from chest X-ray images, training the neural network model on the Japanese Society of Radiological Technology (JSRT) dataset and using UNet to enable faster processing of initial screening for various lung disorders. The pipeline can be readily used by medical centers with just the provision of X-ray images as input. The model performs the preprocessing and provides a segmented image as the final output. This is expected to drastically reduce the manual effort involved and lead to greater accessibility in resource-constrained locations.
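For readers who want to see what "just the provision of X-ray images as input" could look like in practice, here is a hedged inference sketch; the TorchScript file name 'unet.pt', the 512x512 input size, and the 0.5 threshold are all assumptions, not details from the paper.

```python
# A sketch of the inference side of such a pipeline; model file and sizes
# are assumptions. The trained UNet is loaded as a TorchScript module.
import numpy as np
import torch
from PIL import Image

def segment_lungs(image_path, model_path="unet.pt", size=512, device="cpu"):
    model = torch.jit.load(model_path, map_location=device).eval()
    img = Image.open(image_path).convert("L").resize((size, size))    # preprocessing
    x = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)   # normalize to [0, 1]
    x = x.unsqueeze(0).unsqueeze(0)                                   # shape (1, 1, H, W)
    with torch.no_grad():
        mask = torch.sigmoid(model(x))[0, 0] > 0.5                    # binary lung mask
    return mask.numpy()
```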
It is essential to classify brain tumors from magnetic resonance imaging (MRI) accurately for better and timely treatment of patients. In this paper, we propose a hybrid model, using VGG along with a nonlinear SVM (soft and hard), to classify brain tumors: glioma vs. pituitary, and tumorous vs. non-tumorous. The VGG-SVM model is trained on two different two-class datasets; thus, we perform binary classification. The VGG models are trained via the PyTorch python library to obtain the highest testing accuracy of tumor classification. The method is threefold: in the first step, we normalize and resize the images; the second step consists of feature extraction through variants of the VGG model; and the third step classifies brain tumors using a non-linear SVM (soft and hard). We obtain 98.18% accuracy on the first dataset and 99.78% on the second dataset using VGG19. On D1, the classification accuracies for the non-linear SVM are 95.50% and 97.98% with the linear and RBF kernels, respectively, and 97.95% for the soft SVM with the RBF kernel; on D2, they are 96.75% and 98.60% with the linear and RBF kernels, and 98.38% for the soft SVM with the RBF kernel. The results indicate that the hybrid VGG-SVM model, especially VGG19 with SVM, is able to outperform existing techniques and achieve high accuracy.
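The threefold method lends itself to a short sketch: a frozen VGG19 extracts features, and a nonlinear SVM classifies them. The torchvision/scikit-learn tooling and the 224x224 input assumption below are ours, not necessarily the paper's exact setup.

```python
# A sketch of the hybrid idea: VGG19 as a frozen feature extractor feeding a
# nonlinear SVM. Tooling choices here are assumptions, not the paper's setup.
import torch
from torchvision import models
from sklearn.svm import SVC

def extract_features(images):                  # images: (N, 3, 224, 224) float tensor
    vgg = models.vgg19(weights="IMAGENET1K_V1").eval()
    with torch.no_grad():
        feats = vgg.features(images)
        feats = vgg.avgpool(feats).flatten(1)  # (N, 512*7*7) feature vectors
    return feats.numpy()

def train_classifier(train_images, train_labels, C=1.0):
    # Soft-margin RBF SVM; a very large C approximates the hard-margin case.
    clf = SVC(kernel="rbf", C=C)
    clf.fit(extract_features(train_images), train_labels)
    return clf
```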
In medical image analysis, automated segmentation of multi-component anatomical structures, which often have a spectrum of potential anomalies and pathologies, is a challenging task. In this work, we develop a multi-step approach using U-Net-based neural networks to initially detect anomalies (bone marrow lesions, bone cysts) in the distal femur, proximal tibia and patella from 3D magnetic resonance (MR) images of the knee in individuals with varying grades of osteoarthritis. Subsequently, the extracted data are used for downstream tasks involving semantic segmentation of individual bone and cartilage volumes as well as bone anomalies. For anomaly detection, the U-Net-based models were developed to reconstruct the bone profiles of the femur and tibia in images via inpainting, so anomalous bone regions could be replaced with close-to-normal appearances. The reconstruction error was used to detect bone anomalies. A second anomaly-aware network, which was compared to anomaly-naïve segmentation networks, was used to provide a final automated segmentation of the femoral, tibial and patellar bones and cartilages from the knee MR images containing a spectrum of bone anomalies. The anomaly-aware segmentation approach provided up to a 58% reduction in Hausdorff distances for bone segmentations compared to the results from the anomaly-naïve segmentation networks. In addition, the anomaly-aware networks were able to detect bone lesions in the MR images with greater sensitivity and specificity (area under the receiver operating characteristic curve [AUC] up to 0.896) compared to the anomaly-naïve segmentation networks (AUC up to 0.874).
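The inpainting-based detection step can be illustrated as follows; this is a hedged sketch, assuming some trained inpainting network 'net', a patch-wise masking scheme, and 2D slices, none of which are claimed to match the paper's exact protocol.

```python
# A sketch of reconstruction-error anomaly detection via inpainting, assuming
# a trained inpainting network 'net' (e.g. a U-Net restored from disk); the
# masking scheme and patch size are illustrative, not the paper's protocol.
import numpy as np
import torch

def anomaly_map(volume_slice, net, patch=32, stride=32):
    """Mask each patch, inpaint it from its surroundings, and record the error:
    healthy bone reconstructs well, while lesions and cysts do not."""
    h, w = volume_slice.shape
    err = np.zeros_like(volume_slice, dtype=np.float32)
    x = torch.from_numpy(volume_slice).float()[None, None]        # (1, 1, H, W)
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            masked = x.clone()
            masked[..., i:i+patch, j:j+patch] = 0                 # hide this patch
            with torch.no_grad():
                recon = net(masked)
            diff = (recon - x)[0, 0, i:i+patch, j:j+patch].abs()
            err[i:i+patch, j:j+patch] = diff.numpy()
    return err                                                    # high error = likely anomaly
```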